Fail HTLCs from late counterparty commitment updates in ChannelMonitor #4434

joostjager wants to merge 6 commits into lightningdevkit:main
Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!
Force-pushed 2d36c03 to cce8ccb
Codecov Report

```diff
@@           Coverage Diff            @@
##             main   #4434     +/-  ##
========================================
+ Coverage   85.93%  85.97%   +0.03%
========================================
  Files         159     159
  Lines      104693  104712      +19
========================================
+ Hits        89972   90022      +50
+ Misses      12213   12187      -26
+ Partials     2508    2503       -5
```
lightning/src/ln/channelmanager.rs (outdated diff)

```rust
chan.force_shutdown(reason)
let in_flight = in_flight_monitor_updates
	.get(&chan_id)
	.map(|(_, updates)| updates.iter().collect::<Vec<_>>())
```
Do we really need this allocation? Can we just pass the iterator to force_shutdown instead?
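A minimal sketch of the suggestion, with stand-in types rather than the real LDK internals: if `force_shutdown` accepts an `impl Iterator`, the caller can pass the chained iterator directly and skip the `collect`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real ChannelMonitorUpdate and Channel types.
struct ChannelMonitorUpdate {
	update_id: u64,
}

struct Channel;

impl Channel {
	// Taking `impl Iterator` instead of `Vec<&ChannelMonitorUpdate>` lets the
	// caller skip the intermediate allocation entirely.
	fn force_shutdown<'a>(
		&mut self, in_flight: impl Iterator<Item = &'a ChannelMonitorUpdate>,
	) {
		for update in in_flight {
			let _ = update.update_id; // ... inspect each in-flight update ...
		}
	}
}

fn main() {
	let mut in_flight_monitor_updates: HashMap<u8, Vec<ChannelMonitorUpdate>> =
		HashMap::new();
	in_flight_monitor_updates.insert(0, vec![ChannelMonitorUpdate { update_id: 1 }]);

	let mut chan = Channel;
	// Pass the iterator straight through instead of `.collect::<Vec<_>>()`.
	let in_flight = in_flight_monitor_updates
		.get(&0)
		.into_iter()
		.flat_map(|updates| updates.iter());
	chan.force_shutdown(in_flight);
}
```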
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.
Force-pushed cce8ccb to ad3036e
TheBlueMatt left a comment:
Hmmmmmmmmm, it's really not quite so simple. If a monitor update is InProgress, we generally expect the in-memory ChannelMonitor to have actually seen the new state update (though we expect that on restart there's a good chance it won't have that information and it'll be replayed). That means the monitor is theoretically "in charge" of resolving the HTLCs in question. While I think this patch is correct, I'm a bit hesitant to end up with two things in charge of resolving a specific HTLC. ISTM it also wouldn't be so complicated to immediately mark HTLCs as failed in the monitor upon receiving updates that try to add new HTLCs after a channel has been closed.
Isn't the problem with this that the monitor can't see whether the commitment was already sent to the peer or blocked by async persistence?
Right, the monitor has to assume it may have been sent to the peer. That's fine, though; it should be able to fail the HTLC(s) once the commitment transaction is locked in.
Hmm, it seems that the original issue isn't an issue then. I confirmed that mining the force close commitment tx indeed generates the payment failed event. Or is there still something missing?
In the general case (
The one that I identified wasn't a crash case. An assertion in the fuzzer identified that a payment was lost, but with my current understanding I think it's just that the fuzzer didn't push the force-close forward enough (i.e. mine the commitment tx). Will close it for now and can look again if the payment remains lost even with confirmation of the commitment tx. A unit test did confirm that it should just work.
Reopening as this fix is now required for #4351. |
Now that we've decided that for deferred writes we no longer want to expect the in-memory ChannelMonitor to have seen the changes, I think the above doesn't apply anymore, and we might have to accept that two things are in charge of resolving a specific HTLC.
lightning/src/ln/channel.rs (outdated diff)

```rust
if let OutboundHTLCState::LocalAnnounced(_) = htlc.state {
for update in self.blocked_monitor_updates.iter() {
for update in update.update.updates.iter() {
let blocked_iter = self.blocked_monitor_updates.iter().map(|u| &u.update);
```
I do wonder if it is also possible for outbound HTLCs to have already progressed to the Committed or later states? In that case, we might also want to catch that here.
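A sketch of this concern, using a simplified stand-in for LDK's `OutboundHTLCState` (the real variants carry payloads): a filter matching only `LocalAnnounced` would miss HTLCs that had already advanced further when the update was applied.

```rust
// Simplified stand-in; not LDK's actual state machine.
enum OutboundHTLCState {
	LocalAnnounced,
	Committed,
	RemoteRemoved,
}

struct Htlc {
	state: OutboundHTLCState,
}

fn needs_fail_back(htlc: &Htlc) -> bool {
	// Catch Committed (and potentially later states) in addition to
	// LocalAnnounced, per the review comment above.
	matches!(
		htlc.state,
		OutboundHTLCState::LocalAnnounced | OutboundHTLCState::Committed
	)
}
```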
Force-pushed 6421a28 to 8dac2fa
```rust
events.pop().unwrap(),
Event::ChannelClosed { reason: ClosureReason::CommitmentTxConfirmed, .. }
));
expect_payment_failed_conditions_event(
```
This now no longer needs to wait for the force close to confirm.
Force-pushed 8dac2fa to da66b8e
TheBlueMatt left a comment:
Now that we've decided that for deferred writes we no longer want to expect the in-memory ChannelMonitor to have seen the changes, I think the above doesn't apply anymore, and we might have to accept that two things are in charge of resolving a specific HTLC.
I don't think so. Instead, what we'll need to do is detect a ChannelMonitorUpdate carrying a new counterparty commitment state after the channel has closed, and track it so we can fail the HTLCs from it once the commitment transaction has 6 confs.
Oh actually no, I don't think we need to wait for 6 confs since we never sent the message, i.e. we can fail it immediately.
Does the monitor know that a message was never sent? #4434 (comment) suggests that we should assume we did send. I steered Claude to a potential fix: main...joostjager:rust-lightning:chain-mon-internal-deferred-writes-fix-contract. I based it on the deferred-writes branch so that tests can exercise delayed in-memory updates. Is this the direction you have in mind? It is unfortunate that we need to add this extra logic to a system that is already complicated.
Seems weird to do this based on the monitor data rather than building the to-fail list from the monitor update.
Force-pushed da66b8e to ea72576
Moved the branch into this PR and updated it to use more data from the monitor update. It still needs to fetch monitor data too. The state space for this fix is large. It is the cross product of:

- funding spend confirmation status (pending vs confirmed),
- which transaction confirmed (counterparty vs current/previous holder commitment),
- whether each HTLC is non-dust in the confirmed commitment,
- whether it was already fulfilled via preimage, and
- whether it was already failed through a previously-known commitment (with the prior failure potentially still pending, already matured, or fully consumed by the ChannelManager).

I'm not fully confident that every combination is handled correctly.
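To make the cross product above concrete, here is a hypothetical test-matrix type enumerating those dimensions (all names are illustrative, not LDK API):

```rust
// Stand-in enumeration of the state space described in the comment above.
#[derive(Clone, Copy, Debug)]
enum ConfirmedTx {
	NotConfirmed, // funding spend still pending
	CounterpartyCommitment,
	CurrentHolderCommitment,
	PreviousHolderCommitment,
}

#[derive(Clone, Copy, Debug)]
enum PriorFailure {
	NotFailed,
	PendingConfirmation, // onchain event still maturing
	Matured,             // failure generated, not yet consumed
	ConsumedByManager,   // ChannelManager already handled it
}

#[derive(Clone, Copy, Debug)]
struct HtlcCase {
	confirmed_tx: ConfirmedTx,
	non_dust_in_confirmed_tx: bool,
	already_fulfilled: bool,
	prior_failure: PriorFailure,
}

// Full cross product: 4 * 2 * 2 * 4 = 64 combinations to reason about.
fn all_cases() -> Vec<HtlcCase> {
	use ConfirmedTx::*;
	use PriorFailure::*;
	let mut cases = Vec::new();
	for &confirmed_tx in
		&[NotConfirmed, CounterpartyCommitment, CurrentHolderCommitment, PreviousHolderCommitment]
	{
		for &non_dust in &[false, true] {
			for &fulfilled in &[false, true] {
				for &prior in &[NotFailed, PendingConfirmation, Matured, ConsumedByManager] {
					cases.push(HtlcCase {
						confirmed_tx,
						non_dust_in_confirmed_tx: non_dust,
						already_fulfilled: fulfilled,
						prior_failure: prior,
					});
				}
			}
		}
	}
	cases
}
```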
The previous wording implied that persisting a full ChannelMonitor would automatically resolve all pending updates. Reword to make clear that each update ID still needs to be individually marked complete via channel_monitor_updated, even after a full monitor persistence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract the ChannelMonitor construction boilerplate from channelmonitor test functions into a reusable dummy_monitor helper in test_utils.rs, generic over the signer type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pure refactor: move the bodies of Watch::watch_channel and Watch::update_channel into methods on ChainMonitor, and have the Watch trait methods delegate to them. This prepares for adding deferred mode where the Watch methods will conditionally queue operations instead of executing them immediately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
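A minimal sketch of the delegation pattern this commit describes, with heavily simplified signatures (the real `Watch` trait takes funding outpoints, monitors, and updates):

```rust
// Simplified stand-in for LDK's Watch trait.
trait Watch {
	fn watch_channel(&self, channel_id: u64) -> Result<(), ()>;
	fn update_channel(&self, channel_id: u64) -> Result<(), ()>;
}

struct ChainMonitor;

impl ChainMonitor {
	// Inherent methods hold the real logic, so a later deferred mode can
	// invoke them from a queue-draining flush() instead of the trait path.
	fn watch_channel_internal(&self, _channel_id: u64) -> Result<(), ()> {
		Ok(())
	}
	fn update_channel_internal(&self, _channel_id: u64) -> Result<(), ()> {
		Ok(())
	}
}

impl Watch for ChainMonitor {
	// The trait impl becomes a thin shim over the inherent methods.
	fn watch_channel(&self, channel_id: u64) -> Result<(), ()> {
		self.watch_channel_internal(channel_id)
	}
	fn update_channel(&self, channel_id: u64) -> Result<(), ()> {
		self.update_channel_internal(channel_id)
	}
}
```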
Add a `deferred` parameter to `ChainMonitor::new` and `ChainMonitor::new_async_beta`. When set to true, the Watch trait methods (watch_channel and update_channel) will unimplemented!() for now. All existing callers pass false to preserve current behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the unimplemented!() stubs with a full deferred write implementation. When ChainMonitor has deferred=true, Watch trait operations queue PendingMonitorOp entries instead of executing immediately. A new flush() method drains the queue and forwards operations to the internal watch/update methods, calling channel_monitor_updated on Completed status. The BackgroundProcessor is updated to capture pending_operation_count before persisting the ChannelManager, then flush that many writes afterward, ensuring monitor writes happen in the correct order relative to manager persistence.

Key changes:
- Add PendingMonitorOp enum and pending_ops queue to ChainMonitor
- Implement flush() and pending_operation_count() public methods
- Integrate flush calls in BackgroundProcessor (both sync and async)
- Add TestChainMonitor::new_deferred, flush helpers, and auto-flush in release_pending_monitor_events for test compatibility
- Add create_node_cfgs_deferred for deferred-mode test networks
- Add unit tests for queue/flush mechanics and full payment flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
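A condensed sketch of the queue-and-flush mechanics described above, assuming simplified stand-ins for the real types and signatures:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Illustrative stand-in for the pending-operation entries; not LDK's API.
enum PendingMonitorOp {
	Watch { channel_id: u64 },
	Update { channel_id: u64, update_id: u64 },
}

struct ChainMonitor {
	deferred: bool,
	pending_ops: Mutex<VecDeque<PendingMonitorOp>>,
}

impl ChainMonitor {
	fn update_channel(&self, channel_id: u64, update_id: u64) {
		if self.deferred {
			// Queue instead of persisting immediately.
			self.pending_ops
				.lock()
				.unwrap()
				.push_back(PendingMonitorOp::Update { channel_id, update_id });
		} else {
			self.apply_update(channel_id, update_id);
		}
	}

	fn pending_operation_count(&self) -> usize {
		self.pending_ops.lock().unwrap().len()
	}

	// The background processor captures pending_operation_count() before
	// persisting the ChannelManager, then flushes exactly that many ops,
	// keeping monitor writes ordered after the manager write.
	fn flush(&self, count: usize) {
		for _ in 0..count {
			let op = self.pending_ops.lock().unwrap().pop_front();
			match op {
				Some(PendingMonitorOp::Update { channel_id, update_id }) => {
					self.apply_update(channel_id, update_id)
				},
				Some(PendingMonitorOp::Watch { channel_id }) => {
					let _ = channel_id; // ... persist the initial monitor ...
				},
				None => break,
			}
		}
	}

	fn apply_update(&self, _channel_id: u64, _update_id: u64) {
		// ... persist, then mark the update complete on Completed status ...
	}
}
```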
Force-pushed ea72576 to 79bc75e
When a ChannelMonitorUpdate containing a new counterparty commitment is dispatched (e.g. via deferred writes) before a channel force-closes but only applied to the in-memory monitor after the commitment transaction has already confirmed on-chain, the outbound HTLCs in that update must be failed back.

Add fail_htlcs_from_update_after_funding_spend to ChannelMonitorImpl, which detects this race condition during update_monitor. When a LatestCounterpartyCommitmentTXInfo or LatestCounterpartyCommitment update is applied and the funding output has already been spent, the function creates OnchainEvent::HTLCUpdate entries for HTLCs that are not present as non-dust outputs in the confirmed commitment transaction. These entries mature after ANTI_REORG_DELAY blocks, giving the peer time to potentially broadcast the newer commitment.

HTLCs that appear as non-dust outputs in the confirmed commitment (whether counterparty or holder) are skipped, as they will be resolved on-chain via the normal HTLC timeout/success path. HTLCs already fulfilled by the counterparty (tracked in counterparty_fulfilled_htlcs) are also skipped. Duplicate failures from previously-known counterparty commitments are handled gracefully by the ChannelManager.

AI tools were used in preparing this commit.
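A condensed sketch of the detection logic this commit message describes, with simplified stand-in types (the real code works on HTLCSource, PaymentHash, and the monitor's funding state):

```rust
// Stand-in for an HTLC carried in the late counterparty commitment update.
struct HtlcInUpdate {
	source_id: u64, // stand-in for HTLCSource
	non_dust_in_confirmed_tx: bool,
	counterparty_fulfilled: bool,
}

struct MaturingFailure {
	source_id: u64,
	mature_at_height: u32,
}

const ANTI_REORG_DELAY: u32 = 6;

fn fail_htlcs_from_update_after_funding_spend(
	funding_spend_confirmed: bool, spend_height: u32, htlcs_in_update: &[HtlcInUpdate],
) -> Vec<MaturingFailure> {
	// Only relevant when the commitment tx already confirmed before the
	// update reached the in-memory monitor.
	if !funding_spend_confirmed {
		return Vec::new();
	}
	htlcs_in_update
		.iter()
		// Skip HTLCs with an on-chain output: resolved by timeout/success path.
		.filter(|h| !h.non_dust_in_confirmed_tx)
		// Skip HTLCs the counterparty already fulfilled with a preimage.
		.filter(|h| !h.counterparty_fulfilled)
		// Fail the rest once ANTI_REORG_DELAY blocks have passed, leaving room
		// for the peer to broadcast the newer commitment instead.
		.map(|h| MaturingFailure {
			source_id: h.source_id,
			mature_at_height: spend_height + ANTI_REORG_DELAY,
		})
		.collect()
}
```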
Force-pushed 79bc75e to 1e37fd6
Given that this is now based on #4351 (to make testing easier), we could sequence as follows: review this PR first, then merge #4351, then merge this one.
```rust
// race where a monitor update is dispatched before the channel force-closes but only
// applied after the commitment transaction confirms.
for update in updates.updates.iter() {
let htlcs: Vec<(&HTLCSource, PaymentHash, u64)> = match update {
```
No need to Vec/collect here, just keep it as an iterator.
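One way to honor this without an allocation, sketched with hypothetical update variants: have each match arm produce a slice, so a single iterator type falls out (where arms genuinely produce different iterator types, a boxed iterator is the usual alternative).

```rust
// Hypothetical stand-ins; not the real ChannelMonitorUpdateStep variants.
struct HtlcRef {
	payment_hash: [u8; 32],
}

enum MonitorUpdateStep {
	LatestCounterpartyCommitment { htlcs: Vec<HtlcRef> },
	Other,
}

// Each arm yields a `&[HtlcRef]`, so no Vec/collect is needed to unify types.
fn htlcs_in_step<'a>(step: &'a MonitorUpdateStep) -> impl Iterator<Item = &'a HtlcRef> + 'a {
	let slice: &'a [HtlcRef] = match step {
		MonitorUpdateStep::LatestCounterpartyCommitment { htlcs } => htlcs,
		MonitorUpdateStep::Other => &[],
	};
	slice.iter()
}
```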
```rust
// Skip HTLCs that appear as non-dust outputs in the confirmed commitment.
let in_confirmed_commitment = confirmed_commitment_htlcs.iter().any(|htlc| {
	htlc.transaction_output_index.is_some()
		&& htlc.payment_hash == payment_hash
```
No, we have to match by source, not payment_hash + amount.
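A sketch of the distinction with stand-in types: payment_hash plus amount can collide across HTLCs (e.g. duplicate payments with the same hash and value), so the match should key off the source instead.

```rust
// Stand-in for LDK's HTLCSource (prev hop, session key, etc.).
#[derive(PartialEq)]
struct HtlcSource {
	id: u64,
}

struct ConfirmedHtlc {
	source: Option<HtlcSource>,
	transaction_output_index: Option<u32>,
}

// Match by source, which uniquely identifies the HTLC to fail back, rather
// than by (payment_hash, amount), which may be ambiguous.
fn in_confirmed_commitment(confirmed: &[ConfirmedHtlc], source: &HtlcSource) -> bool {
	confirmed.iter().any(|htlc| {
		htlc.transaction_output_index.is_some() && htlc.source.as_ref() == Some(source)
	})
}
```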
```rust
	waiting for confirmation (at height {})",
	entry.confirmation_threshold()
);
self.onchain_events_awaiting_threshold_conf.push(entry);
```
Not actually sure we need to push this as an event-pending-confirmation (in theory we refused to let the manager send out the new commitment so we could fail immediately), but it seems marginally safer and I don't see a reason why it won't work (if we get a reorg we have the new remote commitment stored anyway so we'll handle it correctly then). 👍
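For reference, a simplified sketch of the maturation pattern being discussed; the threshold arithmetic follows LDK's ANTI_REORG_DELAY convention, but the types here are stand-ins.

```rust
const ANTI_REORG_DELAY: u32 = 6;

// Stand-in for the real OnchainEvent entries.
struct OnchainEvent {
	height: u32, // height at which the triggering tx confirmed
}

impl OnchainEvent {
	fn confirmation_threshold(&self) -> u32 {
		self.height + ANTI_REORG_DELAY - 1
	}
}

struct Monitor {
	onchain_events_awaiting_threshold_conf: Vec<OnchainEvent>,
}

impl Monitor {
	fn queue_failure(&mut self, confirmation_height: u32) {
		// A reorg simply leaves the entry unmatured; the newer remote
		// commitment is still stored, so it gets handled correctly then.
		self.onchain_events_awaiting_threshold_conf
			.push(OnchainEvent { height: confirmation_height });
	}

	// Entries only surface once the chain reaches the threshold height.
	fn mature_events(&mut self, current_height: u32) -> Vec<OnchainEvent> {
		let (ready, waiting): (Vec<_>, Vec<_>) = self
			.onchain_events_awaiting_threshold_conf
			.drain(..)
			.partition(|e| e.confirmation_threshold() <= current_height);
		self.onchain_events_awaiting_threshold_conf = waiting;
		ready
	}
}
```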
```rust
// Collect the confirmed commitment's non-dust HTLCs so we can skip HTLCs that have
// on-chain outputs (those will be resolved via the normal HTLC timeout/success path).
let confirmed_commitment_htlcs: Vec<_> = if let Some(htlc_list) =
```
Here too we should be able to avoid a Vec and just store an iterator.
```rust
// Collect the confirmed commitment's non-dust HTLCs so we can skip HTLCs that have
// on-chain outputs (those will be resolved via the normal HTLC timeout/success path).
let confirmed_commitment_htlcs: Vec<_> = if let Some(htlc_list) =
	self.funding.counterparty_claimable_outpoints.get(&spending_txid)
```
Probably simpler to just always check both prev counterparty commitment and both local commitments. This will avoid issues when we add counterparty_claimable_outpoints offloading.
Yes, but I don't think the counterparty commitment tx itself is available? update_counterparty_commitment_data doesn't seem to store it, and puts the HTLCs in counterparty_claimable_outpoints.
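To sketch the suggestion (and the storage caveat raised in the reply), assuming simplified stand-in fields: check the counterparty outpoints map first, then both holder commitments.

```rust
use std::collections::HashMap;

type Txid = [u8; 32];

struct Htlc;

// Stand-in for the monitor's funding state: counterparty HTLCs live in a map
// keyed by commitment txid, while holder commitments carry their own HTLCs.
struct Funding {
	counterparty_claimable_outpoints: HashMap<Txid, Vec<Htlc>>,
	current_holder_htlcs: (Txid, Vec<Htlc>),
	prev_holder_htlcs: Option<(Txid, Vec<Htlc>)>,
}

fn confirmed_commitment_htlcs<'a>(
	funding: &'a Funding, spending_txid: &Txid,
) -> Option<&'a [Htlc]> {
	// Counterparty commitments (current and previous) are keyed by txid here,
	// which sidesteps needing the counterparty commitment tx itself.
	if let Some(htlcs) = funding.counterparty_claimable_outpoints.get(spending_txid) {
		return Some(htlcs.as_slice());
	}
	// Then check both local commitments, per the suggestion above.
	if funding.current_holder_htlcs.0 == *spending_txid {
		return Some(funding.current_holder_htlcs.1.as_slice());
	}
	if let Some((txid, htlcs)) = funding.prev_holder_htlcs.as_ref() {
		if txid == spending_txid {
			return Some(htlcs.as_slice());
		}
	}
	None
}
```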
Closes #4431
Builds on top of #4351